The Broken Promise of Semantic Versioning

There are many ideas in software engineering that sound good when you first hear them but, when held up to scrutiny or used in a non-trivial setting, show themselves to be severely lacking. Usually this comes from a kind of idealism which ignores the practical scenarios that arise in real-world problems. I believe semantic versioning is such an idea.

For the uninitiated, semantic versioning is a versioning scheme defined by a standard in which the version number specifies a formally defined difference in the code, rather than just expressing an intent. The defining feature I will focus on here is that, after version 1.0.0, only a major version increment may contain breaking changes. That is, an upgrade from 1.*.* to any later 1.*.* must not break the public API of the program or library.
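
As a rough sketch, that promise can be written as a predicate over two version numbers. The Rust below is illustrative only: the Version type and the promises_compatibility function are hypothetical names, not part of any library, and pre-release and build metadata are ignored.

```rust
/// A simplified semantic version (pre-release and build metadata ignored).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

/// Hypothetical helper: does the specification promise that moving from
/// `from` to `to` will not break the public API?
pub fn promises_compatibility(from: Version, to: Version) -> bool {
    // Before 1.0.0 the specification makes no stability promise at all.
    let stable = from.major >= 1;
    // Only moves within the same major version are covered.
    let same_major = from.major == to.major;
    // The derived ordering compares major, then minor, then patch.
    // Downgrades are excluded because a minor release may add API.
    let is_upgrade = to >= from;
    stable && same_major && is_upgrade
}
```

Under this model, moving from 1.2.3 to 1.9.0 is promised to be safe, while moving from 1.2.3 to 2.0.0, or from 0.3.0 to 0.4.0, carries no promise at all.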

My problem with semantic versioning can be summarised briefly as follows: a significant amount of software does not correctly follow the semantic versioning specification, and thus the claim or assumption that it does results in more bugs and frustration than if we did away with semantic versioning entirely, forcing developers to treat upgrades with greater care.

Let's start with the first major problem: semantic versioning violations are pervasive. Some research from last year, summarised in this article, found that across more than 14,000 releases of the top 1000 most downloaded packages on crates.io, the Rust package repository, around 1 in 31 releases and more than 1 in 6 packages had at least one semantic versioning violation. As written in the linked post,

Demanding perfection from maintainers would be naive, unreasonable, and unfair. Whenever hardworking, conscientious, well-intentioned people make a mistake, the failure is not with the people but in the system.

The conclusion they provide, though, which involves better tooling to detect such violations, is, to my eye, questionable. The ideal of semantic versioning is that when minor security updates, performance improvements, or bug fixes occur, the consumer of the software does not need to worry about breakages, and can just update without further consideration. Tools like Cargo will perform such an update automatically, despite Cargo not formally specifying what the "public API" of a Rust package is considered to be. From the documentation,

These are only guidelines, and not necessarily hard-and-fast rules that all projects will obey... Almost every change carries some risk that it will negatively affect the runtime behaviour, and for those cases it is usually a judgement call by the project maintainers whether or not it is a SemVer-incompatible change.

In fact, the guidelines focus on the API from the perspective of the compiler, not in terms of the actual behaviour of the code.

The semantic versioning specification says that all "software using semantic versioning MUST declare a public API", yet the package manager performs updates automatically while leaving the definition of that public API up to the individual package developer. As a result, Cargo can update a package in a way that breaks the behaviour of a program even if the developer has a well-defined public API.

Moreover, it is well within my rights to turn a max function on an array into one which calculates the minimum and not release a major version, so long as I document that the behaviour is not part of the public API. Then Cargo can go right ahead and push my update to all my unwitting users.
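
To make that concrete, here is a minimal sketch in Rust. Every name in it is hypothetical: extreme_v1_0_0 and extreme_v1_0_1 stand in for the same library function at two releases, and headroom stands in for downstream code. The two releases have identical signatures, so a check that only looks at the API as the compiler sees it has nothing to flag.

```rust
/// The hypothetical library function as shipped in 1.0.0:
/// returns the largest element, or None for an empty slice.
pub fn extreme_v1_0_0(values: &[i32]) -> Option<i32> {
    values.iter().copied().max()
}

/// The same function as shipped in 1.0.1: the signature is unchanged,
/// but it now returns the smallest element instead.
pub fn extreme_v1_0_1(values: &[i32]) -> Option<i32> {
    values.iter().copied().min()
}

/// Hypothetical downstream code, written on the assumption that the
/// library returns the peak reading. It compiles against either release.
pub fn headroom(
    limit: i32,
    readings: &[i32],
    extreme: fn(&[i32]) -> Option<i32>,
) -> Option<i32> {
    extreme(readings).map(|peak| limit - peak)
}
```

With a limit of 10 and readings of 1, 5 and 3, the downstream code reports a headroom of 5 against the first release and 9 against the second. Nothing fails to compile; only a test of the actual behaviour would notice the change.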

This shortcoming is fully admitted in the specification, which states

This is not a new or revolutionary idea. In fact, you probably do something close to this already. The problem is that "close" isn’t good enough. Without compliance to some sort of formal specification, version numbers are essentially useless for dependency management.

Well, guess what? People don't adhere to this specification, and there is no way they can be guaranteed to, yet we somehow just assume that things follow the semantic versioning specification according to our own ideas of what a breaking change is, leading to a problem worse than the one it is trying to solve.

The question of what constitutes a bug, or what should be part of the public API of a program or library, is very much open for debate. This ought not to be a problem when developers properly document what is considered part of the public API of their code. However, here we come up against idealism again. Anyone who has spent enough time in software will have come across examples of what is known as Hyrum's law, even if not recognising it by that name. It states that

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviours of your system will be depended on by somebody.

This has been thoroughly demonstrated throughout software, most extremely with bugs that were depended on by enough users that they became features, including the undo button in Gmail, or dotfiles being hidden on Unix.

Therefore, familiarising oneself with this public API, and making sure that the only parts of the program which are depended on are specified in this API, is a necessary part of ensuring safe upgrades of your software. However, if one has to check this carefully to verify that an update is safe, why not just read the changelog? "Changelogs can be wrong or incomplete," I hear you cry. Thank you for making my point for me: breaking changes can make their way into minor versions in just the same way. This is why a responsible software developer should actually run and test their software after performing dependency updates to verify that the behaviour is as intended.

It is a serious indictment of semantic versioning that many projects have adopted the zero versioning philosophy, where software sticks with a 0.x.x version number for its entire lifespan, such that nothing is considered stable and there is complete freedom to make breaking changes. While I do agree that allowing breaking changes is often important to produce the highest quality software, if the solution to this is to always stay within the version number range in which, according to the spec, all bets are off, then semantic versioning serves absolutely no purpose.

The intended solution here seems to be to go 1.0.0 and create new major versions as breaking changes occur. However, ever since the Python 2 to Python 3 debacle, it seems most developers are too afraid to do so. There is so much fear around making breaking changes after 1.0.0, as with languages like Rust which just live with their mistakes indefinitely, that many projects stay 0.*.* for as long as possible, or forever, despite the recommendation on the semantic versioning website being

If your software is being used in production, it should probably already be 1.0.0.

And I don't think it's difficult to see why this fear exists. If I have a library at 1.0.0, then under the semantic versioning scheme and any sensible notion of a public API, I can't remove a single function used by 0.1% of my users without moving to 2.0.0. Some would say this is fine, just do the update, but now the major.minor.patch numbers express a confusingly different intent from what they are supposed to: a clearly minor update results in a major version number change.

This whole situation puts us in a weird middle ground, where people like Andrew Kelley, who runs the Zig project, said for a long time that his software should not be used in production, despite getting much of its funding from companies using it in production... Now he says that people should use it in production, but it's not 1.0.0, and he has also expressed complete willingness to make breaking changes. It's almost as though three little numbers don't actually express very much about the actual state of the software.

Most damaging, though, is this strange obsession with version numbers. Many software developers now treat 1.0.0 as though it is the end, not the beginning, because they feel they are then locked in forever. This can't be good for the world of software. A project that actually gets this right is Swift, which has made breaking changes part of the culture of the language, and provides detailed guides on how to upgrade when breaking changes occur.

So what is the upshot of this?

For releasers of software, the solution is to go back to a simple major.minor.patch format of versions which expresses the intent of each version, rather than a formally specified description of how the changes affect users. Or even just use a YYYY.MM.DD date-based scheme. The point is to stop fetishising version numbers, and just provide good documentation about the goals of the project, and good changelogs where breakages occur.

For consumers of software, and by extension developers depending on the code of others, the solution is to be extremely cautious of semantic versioning. Depend on one version of a package only, and treat updates with care and caution. Read the release notes of any new version to check for known breaking changes, and test your software on the new version to find any undocumented ones.